
Poster: dingua Date: Jan 24, 2016 9:51pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Thanks for the reply.
One more question: if I need all subjects/languages for all items in a search, do I need to set the "rows" parameter to a high number, like this:

https://archive.org/advancedsearch.php?q=test&fl%5B%5D=description&fl%5B%5D=language&fl%5B%5D=subject&sort%5B%5D=&sort%5B%5D=&sort%5B%5D=&rows=999999&page=1&output=json&callback=callback&save=yes

I still believe the Archive website itself does not do this to get all topics, because it would take too long to load.

This post was modified by dingua on 2016-01-25 05:51:36


Poster: stbalbach Date: Jan 25, 2016 6:01pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Hi,

Rows is the number of results per page. The &page parameter specifies which page to return. To get all results, find the total number of results (the numFound field in the JSON header data), divide by 50 (or whatever &rows is), round up, and iterate through the pages until done. If there are 2000 results, there are 40 pages of 50 results each:

&rows=50&page=1
&rows=50&page=2
...
&rows=50&page=40

A rows value of 50 or 100 is probably a friendly number that won't overload the server. It also means you can examine the results as they arrive and quit early if the quality of search results gets worse towards the end.
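
The page arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the helper names `page_count` and `page_params` are made up for this example, and it assumes pages are numbered from 1, as in the parameter list above.

```python
import math


def page_count(num_found: int, rows: int = 50) -> int:
    """Number of pages needed to cover num_found results at `rows` per page."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    # Round up so a final, partially filled page is not skipped.
    return math.ceil(num_found / rows)


def page_params(num_found: int, rows: int = 50):
    """Yield the '&rows=..&page=..' fragment for each page, starting at page 1."""
    for page in range(1, page_count(num_found, rows) + 1):
        yield f"&rows={rows}&page={page}"


# Example: list the fragments for a search that reported numFound = 2000.
fragments = list(page_params(2000, rows=50))
print(len(fragments), fragments[0], fragments[-1])
```

Rounding up matters when numFound is not an exact multiple of rows; otherwise the last few results would be missed.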

Stephen


Poster: dingua Date: Jan 25, 2016 8:24pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

Hi,
I think I did not express myself well enough.
What I need exactly is this list:
https://copy.com/LmGV4cxeYZSqduSu

The website shows this list as soon as the first page loads.
For example, for American Library the list presents all topics in the collection (not only those on the loaded pages). This means they must have another request, independent of the one that loads pages, which returns the list of topics (subjects), languages and collections.

For now I have only one solution in mind (based on your first answer): load all items of one search, selecting only the fields I need (language, subject, collection), and on the client side parse the JSON, counting all subjects, languages and collections present in the response. But this will take too much time!
So my next question: is there a simple solution that gives the set of subjects of one search, like this web service, which gives the number of archived items per type:
https://archive.org/index.php?output=json&callback=IAE.topfn
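
The client-side counting described above might look roughly like the following Python sketch. It assumes each returned item is a dict whose `subject`, `language` and `collection` values may be missing, a single string, or a list of strings; the function names and sample data are invented for illustration.

```python
from collections import Counter

FIELDS = ("subject", "language", "collection")


def as_list(value):
    """Normalise a field value (missing, single string, or list) to a list."""
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def tally(docs, fields=FIELDS):
    """Count how often each value occurs per field across all docs."""
    counts = {field: Counter() for field in fields}
    for doc in docs:
        for field in fields:
            counts[field].update(as_list(doc.get(field)))
    return counts


# Made-up sample items, shaped like the assumption stated above.
sample_docs = [
    {"subject": ["history", "maps"], "language": "eng"},
    {"subject": "history", "language": "fre", "collection": ["americana"]},
]
for field, counter in tally(sample_docs).items():
    print(field, counter.most_common())
```

Wrapping single strings in a list avoids counting individual characters when a field holds only one value.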

Thanks!


Poster: stbalbach Date: Jan 25, 2016 9:21pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

It sounds like you want the number of works? Here is the JSON of a search on "William". It has:

"numFound":285934

There are 285934 works containing "William".


Poster: dingua Date: Jan 25, 2016 9:42pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

No, I need the set of subjects of one search.


Poster: stbalbach Date: Jan 26, 2016 5:02pm
Forum: texts Subject: Re: How to get the all list of subjects of one search

See post here. You can't get them all in "one search"; you have to walk through the results one piece at a time using a script.

1. Download the search with rows=50
2. Get the numFound value
3. pages = numFound / rows, rounded up
4. for (i = 1; i <= pages; i++)
       get page i

Where "get page i" is the advanced search URL with the requested field set to "subject", rows=50 and page=i
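
As a rough sketch, the steps above could be scripted in Python like this. It assumes the JSON output has the shape `{"response": {"numFound": ..., "docs": [...]}}` and that pages are numbered from 1; `build_url`, `fetch_json` and `walk_pages` are hypothetical helper names, not part of any Archive API.

```python
import json
import math
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://archive.org/advancedsearch.php"


def build_url(query, page, rows=50, field="subject"):
    """Advanced search URL for one page, requesting a single field."""
    params = [("q", query), ("fl[]", field), ("rows", rows),
              ("page", page), ("output", "json")]
    return BASE + "?" + urlencode(params)


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def walk_pages(query, rows=50, field="subject", fetch=fetch_json):
    """Yield every doc of a search, one page request at a time."""
    # Steps 1 and 2: the first page also reports numFound (assumed shape).
    first = fetch(build_url(query, 1, rows, field))["response"]
    yield from first["docs"]
    # Step 3: round up so a partial last page is included.
    pages = math.ceil(first["numFound"] / rows)
    # Step 4: request the remaining pages in order.
    for page in range(2, pages + 1):
        yield from fetch(build_url(query, page, rows, field))["response"]["docs"]
```

Calling `walk_pages("test")` would issue one request per page; the `fetch` argument exists only so the download step can be swapped out, for example to add a delay between requests or to test without network access.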



This post was modified by stbalbach on 2016-01-27 01:02:24